From Fixed-X to Random-X Regression: Bias-Variance Decompositions, Covariance Penalties, and Prediction Error Estimation

Authors

  • Saharon Rosset
  • Ryan J. Tibshirani
Abstract

In the field of statistical prediction, the tasks of model selection and model evaluation have received extensive treatment in the literature. Among the possible approaches for model selection and evaluation are those based on covariance penalties, which date back to at least the 1960s and are still widely used today. Most of the literature on this topic is based on what we call the “Fixed-X” assumption, where covariate values are assumed to be nonrandom. By contrast, in most modern predictive modeling applications, it is more reasonable to take a “Random-X” view, where the covariate values (both those used in training and those used for future predictions) are random. In the current work, we study the applicability of covariance penalties in the Random-X setting. We propose a decomposition of Random-X prediction error in which the randomness in the covariates contributes to both the bias and variance components of the error decomposition. This decomposition is general, and for concreteness, we examine it in detail in the fundamental case of least squares regression. We prove that, for the least squares estimator, the move from Fixed-X to Random-X prediction always results in an increase in both the bias and variance components of the prediction error. When the covariates are normally distributed and the linear model is unbiased, all terms in this decomposition are explicitly computable, which leads us to propose an extension of Mallows’ Cp (Mallows, 1973) that we call RCp. While RCp provides an unbiased estimate of Random-X prediction error for normal covariates, we also show using standard random matrix theory that it is asymptotically unbiased for certain classes of nonnormal covariates. When the noise variance is unknown, plugging in the usual unbiased estimate leads to an approach that we call R̂Cp, which turns out to be closely related to the existing methods Sp (Tukey, 1967; Hocking, 1976) and GCV (generalized cross-validation; Craven and Wahba, 1978; Golub et al., 1979). As for the excess bias, we propose an estimate based on the well-known “shortcut formula” for ordinary leave-one-out cross-validation (OCV), resulting in a hybrid approach we call RCp. We give both theoretical arguments and numerical simulations to demonstrate that this approach is typically superior to OCV, though the difference is usually small. Lastly, we examine the excess bias and excess variance of other estimators, namely, ridge regression and some common estimators for nonparametric regression. The surprising result we get for ridge regression is that, in the heavily regularized regime, the Random-X prediction variance is guaranteed to be smaller than the Fixed-X variance, which can even lead to smaller overall Random-X prediction error.
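The OCV “shortcut formula” mentioned in the abstract can be illustrated concretely. The sketch below is a minimal, self-contained simulation (the data, dimensions, and variable names are all made up for illustration, not taken from the paper): it verifies that the shortcut leave-one-out residuals e_i / (1 − h_ii) for least squares agree with explicitly refit leave-one-out residuals, and computes the GCV score, which replaces each h_ii by its average p/n.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 50, 5, 1.0

# Illustrative data from an unbiased linear model.
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)
y = X @ beta + sigma * rng.standard_normal(n)

# Hat matrix, in-sample residuals, and leverages for least squares.
H = X @ np.linalg.solve(X.T @ X, X.T)
resid = y - H @ y
h = np.diag(H)

# Shortcut leave-one-out residuals: e_i / (1 - h_ii).
loo_shortcut = resid / (1 - h)

# Explicit leave-one-out residuals, refitting n times, for verification.
loo_explicit = np.empty(n)
for i in range(n):
    mask = np.arange(n) != i
    bhat = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
    loo_explicit[i] = y[i] - X[i] @ bhat

# The shortcut is an exact algebraic identity for least squares.
assert np.allclose(loo_shortcut, loo_explicit)

ocv = np.mean(loo_shortcut ** 2)              # ordinary leave-one-out CV
gcv = np.mean(resid ** 2) / (1 - p / n) ** 2  # generalized CV
print(ocv, gcv)
```

OCV and GCV typically give similar values here; they coincide exactly only when all leverages h_ii equal p/n.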


Similar Articles

Application of Nonlinear Mixed-Effects Modeling Approach in Tree Height Prediction

A nonlinear mixed-effects modeling approach was used to model the individual tree height–diameter relationship based on the Chapman-Richards function for Dahurian larch (Larix gmelinii Rupr.) plantations in northeastern China. The study involved the estimation of fixed and random parameters, as well as procedures for determining random effects variance-covariance matrices to reduce the number of t...


Estimation of Variance Components for Body Weight of Moghani Sheep Using B-Spline Random Regression Models

The aim of the present study was the estimation of (co)variance components and genetic parameters for body weight of Moghani sheep, using random regression models based on B-spline functions. The data set included 9165 body weight records from 60 to 360 days of age from 2811 Moghani sheep, collected between 1994 and 2013 from the Jafar-Abad Animal Research and Breeding Institute, Ardabil province,...


Moment-based Method for Random Effects Selection in Linear Mixed Models.

The selection of random effects in linear mixed models is an important yet challenging problem in practice. We propose a robust and unified framework for automatically selecting random effects and estimating covariance components in linear mixed models. A moment-based loss function is first constructed for estimating the covariance matrix of random effects. Two types of shrinkage penalties, a h...


Multivariate Regression Estimation: Local Polynomial Fitting for Time Series

We consider the estimation of the multivariate regression function m(x_1, …, x_d) = E[ψ(Y_d) | X_1 = x_1, …, X_d = x_d], and its partial derivatives, for stationary random processes {Y_i, X_i} using local higher-order polynomial fitting. Particular cases of ψ yield estimation of the conditional mean, conditional moments, and conditional distributions. Joint asymptotic normality is establi...


Compressed Least-Squares Regression on Sparse Spaces

Recent advances in the area of compressed sensing suggest that it is possible to reconstruct high-dimensional sparse signals from a small number of random projections. Domains in which the sparsity assumption is applicable also offer many interesting large-scale machine learning prediction tasks. It is therefore important to study the effect of random projections as a dimensionality reduction m...



Journal title:

Volume   Issue

Pages  -

Publication date: 2017